ABSTRACT

Aims. We statistically study 197 long gamma-ray bursts (GRBs), detected and measured in detail by the BATSE instrument of
the Compton Gamma-Ray Observatory. For each burst, the sample contains 10 variables describing the time behavior,
the spectra, and other quantities.

Methods. The factor analysis method is used to find the latent random variables describing the temporal and spectral 
properties of GRBs. 

Results. The application of this method to the sample indicates that five factors and the $\mathcal{R}E_{\rm pk}$ spectral
variable (the ratio of peak energies in the spectrum) describe the sample satisfactorily. Both the pseudo-redshifts
inferred from the variability and the Amati-relation in its original form are disfavored.

1. Introduction

Factor Analysis (FA) and Principal Component Analysis
(PCA) are powerful statistical methods in data analysis.
Using PCA and FA, Bagoly et al. (1998) demonstrated
that the 9 variables typically measured ($T_{50}$ and $T_{90}$
durations; $P_{64}$, $P_{256}$, and $P_{1024}$ peak fluxes;
$\mathcal{F}_1$, $\mathcal{F}_2$, $\mathcal{F}_3$, and $\mathcal{F}_4$
fluences) for gamma-ray bursts (GRBs), observed by
the BATSE instrument onboard the Compton Gamma-Ray
Observatory and listed in the Current BATSE Catalog
(Meegan et al. 2001), can be satisfactorily represented by 3
hidden statistical variables. Borgonovo & Björnsson (2006)
(hereafter BB06) studied the statistical properties of 197
long GRBs detected by BATSE. They defined 10 statistical
variables describing the temporal and spectral properties
of GRBs. By performing a PCA, they concluded that
about 70% of the total variance in the parameters was
explained by the first 3 Principal Components (PCs). The
aim of this article is to proceed in a similar way to BB06,
but using FA instead.

By solving the eigenvalue problem of the correlation (co- 
variance) matrix, PCA transforms the observed variables 
into the same number of uncorrelated variables (PCs). An 
essential ingredient of PCA is a distinction between the 
"important" and "less important" variables by taking into 
account the magnitude of the eigenvalues of the correlation 
(covariance) matrix. FA assumes that the observed variables
can be described by a linear combination of hidden
variables given by:

$$ x = \Lambda f + \varepsilon, \qquad (1) $$

where $x$ denotes an observed variable of dimension $p$, $\Lambda$ is a
matrix of $p \times m$ dimensions ($m < p$), and $f$ represents a hidden
variable of $m$ dimensions. The components of $\Lambda$ are called
loadings, the factor $f$ represents the scores, and $\varepsilon$ is a noise
term. We can infer $x$ from observations, while the quantities
on the right-hand side of Eq. (1) have to be computed by a
suitable algorithm.
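
To make Eq. (1) concrete, the following minimal sketch simulates data from a factor model and checks that the covariance of the observed variables is reproduced by the loadings plus the noise variances. The sizes, loading values, and noise variances are arbitrary illustrative assumptions and are not taken from the BATSE sample. The sketch also assumes unit-variance, mutually uncorrelated factors and uncorrelated noise, which Eq. (1) itself does not state.

```python
import numpy as np

rng = np.random.default_rng(0)

# Illustrative sizes and parameters (assumed, not from the paper).
p, m, n = 4, 2, 200_000
Lambda = np.array([[0.9, 0.0],
                   [0.8, 0.1],
                   [0.1, 0.7],
                   [0.0, 0.6]])                 # p x m loadings
psi = np.array([0.19, 0.35, 0.50, 0.64])        # noise variances

f = rng.standard_normal((n, m))                 # scores, one row per burst
eps = rng.standard_normal((n, p)) * np.sqrt(psi)  # noise term
x = f @ Lambda.T + eps                          # Eq. (1), applied row by row

# Under the stated assumptions, cov(x) = Lambda Lambda^T + diag(psi).
sample_cov = np.cov(x, rowvar=False)
model_cov = Lambda @ Lambda.T + np.diag(psi)
max_abs_diff = np.max(np.abs(sample_cov - model_cov))
print(max_abs_diff)
```

The remaining difference is pure sampling noise and shrinks as n grows. In practice the situation is reversed: only x is known, and the loadings and noise variances have to be estimated.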

PCA expresses the observed variable $x$ as a linear transformation
of a hidden variable of the same dimension $p$,
whose components are uncorrelated. The transformation
matrix is built from the eigenvectors of the correlation
matrix of $x$. It can be shown that retaining only the first
$m < p$ eigenvectors gives the transformation matrix that best
reproduces $x$ among all those using only $m < p$ components.
This truncated matrix has dimensions $p \times m$ and yields an
expression identical to the first term on the right-hand side
of Eq. (1). For this reason, PCA is the default solution of FA
in many statistical packages (e.g. SPSS; for a detailed
comparison of PCA and FA, see Jolliffe 2002). FA can, however,
also be solved by other algorithms. In our computations, we use
the Maximum Likelihood (ML) method (for details see Jolliffe 2002).

2. The sample

We use the sample of 197 long GRBs in BB06 and the
10 variables defined there. Of the 10 variables, $T_{90}$ and $\mathcal{F}$
were taken directly from the BATSE Catalog. The remaining
8 variables were calculated by BB06. In summary, the
10 variables are the following: duration time $T_{90}$, emission
time $T_{50}$, autocorrelation function (ACF) half-width $\tau$,
variability $V$, emission symmetry $S_{\mathcal{F}}$, cross-correlation function
time lag $\tau_{\rm lag}$, the ratio of peak energies $\mathcal{R}E_{\rm pk}$, fluence $\mathcal{F}$,
peak energy $E_{\rm pk}$, and low frequency spectral index $\alpha$.

Since the variables have different dimensions, we follow
BB06 and use their decimal logarithms (except for
$\alpha$). The correlations between the variables are given
in Table 1. The choice of logarithms is motivated by
the fact that the distributions of most variables are well
described by log-normal distributions (see the discussion in
BB06).
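
The preprocessing described above can be sketched as follows. The three columns are synthetic log-normal stand-ins generated on the spot (their names, means, and widths are assumptions made only for illustration); they are not the BB06 measurements.

```python
import numpy as np

rng = np.random.default_rng(1)

# Synthetic positive quantities for n bursts (assumed values, not real data).
n = 197
duration = rng.lognormal(mean=3.0, sigma=1.0, size=n)
fluence = duration * rng.lognormal(mean=-12.0, sigma=0.8, size=n)
peak_energy = rng.lognormal(mean=5.5, sigma=0.5, size=n)

# Decimal logarithms remove the different physical dimensions and make
# log-normally distributed variables Gaussian.
log_data = np.log10(np.column_stack([duration, fluence, peak_energy]))

# Correlation matrix of the logarithmic variables (the analogue of Table 1).
corr = np.corrcoef(log_data, rowvar=False)
print(np.round(corr, 2))
```

A variable that can be negative, such as a spectral index, would be left untransformed, in line with the exception made for $\alpha$.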

In a similar way to BB06, we do not consider the fluence
in the highest channel (> 300 keV) separately, although
in Bagoly et al. (1998) this variable alone was used
to define a PC (factor). This choice is motivated by two
reasons: first, fluences in the fourth channel often vanish
or have significant errors ("the values are noisy"); second,
as noted by BB06, this quantity is less important in a
sample of long-soft GRBs only. It is now certain
that the long-soft and short-hard bursts are different
phenomena (Horváth 1998; Norris et al. 2001; Horváth 2002;
Balázs et al. 2003). The significance of the intermediate
GRBs is unclear (Horváth et al. 2006).



3. Estimation of the number of factors 

In contrast to PCA, in FA the number of hypothetical
(latent) random variables (factors) is initially a free
parameter. There is no direct method for determining the
optimal number of factors (even the notion of the
"best number of factors" is unclear; see Jolliffe 2002).

By solving the eigenvalue problem of the correlation matrix,
PCA yields PCs in descending order of the eigenvalue
magnitudes. To validate a factor model, one retains the first
$m < p$ PCs, which satisfactorily reproduce the original
correlation matrix. In the ML method, the expected number of
factors is an input parameter, and the algorithm computes
the probability that the difference between the original and
the reproduced correlation matrix can be attributed to chance
alone. One stops increasing the number of factors when this
probability is sufficiently large.
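
The ML procedure just described can be sketched in a few lines. The paper does not say which implementation was used, so this is only one possible realization: the fit uses plain EM iterations on the correlation matrix, and the test uses the Bartlett-corrected likelihood-ratio statistic. Both choices, and the function names `fit_fa_em` and `lr_statistic`, are assumptions made for illustration. The input matrix is an artificial, exact one-factor correlation matrix.

```python
import numpy as np

def fit_fa_em(S, m, n_iter=2000):
    """EM iterations for S ~ L L^T + diag(psi), S being a p x p correlation matrix."""
    w, v = np.linalg.eigh(S)
    top = np.argsort(w)[::-1][:m]
    L = v[:, top] * np.sqrt(w[top])                    # PCA-based starting point
    psi = np.clip(np.diag(S - L @ L.T), 1e-3, None)
    for _ in range(n_iter):
        sigma_inv = np.linalg.inv(L @ L.T + np.diag(psi))
        beta = L.T @ sigma_inv                         # m x p regression of f on x
        second_moment = np.eye(m) - beta @ L + beta @ S @ beta.T
        L = S @ beta.T @ np.linalg.inv(second_moment)  # M-step for the loadings
        psi = np.clip(np.diag(S - L @ beta @ S), 1e-6, None)
    return L, psi

def lr_statistic(S, L, psi, n):
    """Bartlett-corrected likelihood-ratio statistic and its degrees of freedom."""
    p, m = L.shape
    sigma = L @ L.T + np.diag(psi)
    discrepancy = (np.linalg.slogdet(sigma)[1] - np.linalg.slogdet(S)[1]
                   + np.trace(S @ np.linalg.inv(sigma)) - p)
    stat = (n - 1 - (2 * p + 5) / 6 - 2 * m / 3) * discrepancy
    dof = ((p - m) ** 2 - (p + m)) // 2
    return stat, dof

# Artificial example: an exact one-factor model with assumed loadings.
true_loadings = np.array([[0.9], [0.8], [0.7], [0.6]])
S = true_loadings @ true_loadings.T
np.fill_diagonal(S, 1.0)

L, psi = fit_fa_em(S, m=1)
stat, dof = lr_statistic(S, L, psi, n=197)
print(stat, dof)
```

The probability quoted in the text would follow from comparing `stat` with a chi-square distribution with `dof` degrees of freedom (for example with `scipy.stats.chi2.sf(stat, dof)`, which is not called here to keep the sketch NumPy-only). A statistic close to zero, as in this artificial case, means the assumed number of factors reproduces the correlation matrix.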

The factor model assumes that a linear transformation
exists between the observed and the latent (factor)
variables. The number of unknown parameters (i.e. $p(m+1)$
on the right-hand side of Eq. (1)) is constrained by the
dimension of the covariance matrix of $x$ (i.e. $p(p+1)/2$
independent parameters) and by the need for factor-loading
orthogonality, which provides $m(m-1)/2$ free parameters
(Kendall & Stuart). Thus, the number $m$ of factors
is constrained by the following inequality:

$$ m \le \frac{2p + 1 - \sqrt{8p+1}}{2}, \qquad (2) $$

which provides $m \le 6$ in our case ($p = 10$, for which
$\sqrt{8p+1} = 9$ and the right-hand side equals exactly 6).
Since the number of factors is an integer, $m = 6$ is the
maximum value in our case. Equation (2) provides only the upper
limit to the number of factors; the true number remains to be
estimated.

Several further criteria constrain the required number of
factors (Jolliffe 2002, and references therein). The first
additional criterion follows from the "cumulative percentage
of the total variance": each new factor increases the
percentage of the variation explained by the factors. If one
defines a cut-off percentage, the required number of factors
$m$ is the smallest number for which the cumulative percentage
of variance exceeds this cut-off. There is no exact rule for
the best value of the cut-off: Jolliffe (2002) proposes a value
of around 70%-90% and, in addition, a smaller value if $p \gg 1$.
Hence, in our case a value of around 70% seems to be a good
choice. For PCA applied to the correlation matrix, $m$ can also
be estimated from the eigenvalues of the PCs: PCs with
eigenvalues larger than 0.7 should be retained. When using FA
instead of PCA, one may also assume that the number of factors
should in general not be larger than the number of PCs (in most
cases it is even smaller) (Jolliffe 2002). The most reliable
estimate of the number of factors $m$ is therefore obtained by
combining several criteria.

The advantage of the ML approach is that it helps to
constrain the value of $m$, the dimension of the hidden factor
variables. This is because the ML method provides a
probability for the null hypothesis, i.e. that the correlation
matrix of the observed variables and that reproduced by
the factor solution are identical from the statistical point
of view.

By performing FA on the observed variables assuming
6 factors, which is the maximum number allowed by Eq. (2),
one finds that the null hypothesis holds with a probability of
only $p = 0.0191$, which implies that even the maximum allowable
number of factors cannot reproduce the original correlation
matrix of the observed variables satisfactorily. Table 2
shows the factor coefficients (loadings) of this solution.

Inspection of Table 2 shows that Factor3 and Factor5 are each
dominated by only one variable ($\log\mathcal{R}E_{\rm pk}$
and $\alpha$, respectively) and are hardly affected by the other
variables. Therefore, it appears reasonable to exclude one
of them and repeat the calculations with the remaining 9
variables. In this case, the maximum allowable number of
factors is $m = 5$ (Eq. (2) with $p = 9$ gives $m \le 5.2$), for
which the null hypothesis has a probability of $p = 0.11$ after
excluding $\alpha$ and $p = 0.273$ after excluding
$\log\mathcal{R}E_{\rm pk}$. We therefore decided to exclude
$\log\mathcal{R}E_{\rm pk}$; the ML solution assuming $m = 5$ factors
is given in Table 3. The cumulative variance explained by the 5
factors is 71.9%. This fulfills the "cumulative percentage of
the total variance" criterion for PCA, considering the
correspondingly high value of $p$. This also supports the choice of
5 factors.

We have thus shown that $m = 5$ factors are sufficient. To
show that they are also necessary, we performed the ML analysis
with $m = 4$ factors as well. This calculation gave a probability
of only $p = 0.0044$ that 4 factors are sufficient. One can therefore
conclude that $m = 5$ factors are necessary and sufficient for
describing the observed variables.



4. Results and discussion of FA



The first factor is constrained mainly by $T_{90}$, $T_{50}$, $\tau$, and $\mathcal{F}$,
i.e. the first factor is determined mainly by the temporal
properties. Hence, the measures $T_{50}$ and $T_{90}$ are preferred
over $\tau$ as length indicators.

Table 1. Correlations between the variables.

Variable     log T90  log T50  log τ  log V  log S_F  log τ_lag  log RE_pk  log F  log E_pk      α
log T90         1.00     0.78   0.58   0.18     0.09      -0.01      -0.15    0.5      0.24  -0.26
log T50         0.78     1.00   0.87   0.51     0.25       0.09      -0.21   0.61      0.14  -0.16
log τ           0.58     0.87   1.00    0.4     0.24       0.15      -0.25   0.61      0.14  -0.12
log V           0.18     0.51    0.4   1.00     0.32      -0.18      -0.37   0.33      0.08  -0.07
log S_F         0.09     0.25   0.24   0.32     1.00       0.03      -0.37   0.07     -0.23   0.03
log τ_lag      -0.01     0.09   0.15  -0.18     0.03       1.00       0.24  -0.04     -0.28   0.33
log RE_pk      -0.15    -0.21  -0.25  -0.37    -0.37       0.24       1.00  -0.03      0.04  -0.01
log F            0.5     0.61   0.61   0.33     0.07      -0.04      -0.03   1.00      0.58   -0.2
log E_pk        0.24     0.14   0.14   0.08    -0.23      -0.28       0.04   0.58      1.00  -0.28
α              -0.26    -0.16  -0.12  -0.07    -0.03       0.33      -0.01   -0.2     -0.28   1.00
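
The PCA-based criteria of Sect. 3 (a cumulative-variance cut-off of about 70% and the retention of eigenvalues larger than 0.7) can be tried directly on the correlation matrix of Table 1. The sketch below is only an illustration of how those two counts are obtained; it does not reproduce the ML factor solution. The variable names are hypothetical, and the matrix is typed in from the table with the variables in the same order.

```python
import numpy as np

# Table 1, rows/columns: T90, T50, tau, V, S_F, tau_lag, RE_pk, F, E_pk, alpha.
R = np.array([
    [ 1.00,  0.78,  0.58,  0.18,  0.09, -0.01, -0.15,  0.50,  0.24, -0.26],
    [ 0.78,  1.00,  0.87,  0.51,  0.25,  0.09, -0.21,  0.61,  0.14, -0.16],
    [ 0.58,  0.87,  1.00,  0.40,  0.24,  0.15, -0.25,  0.61,  0.14, -0.12],
    [ 0.18,  0.51,  0.40,  1.00,  0.32, -0.18, -0.37,  0.33,  0.08, -0.07],
    [ 0.09,  0.25,  0.24,  0.32,  1.00,  0.03, -0.37,  0.07, -0.23,  0.03],
    [-0.01,  0.09,  0.15, -0.18,  0.03,  1.00,  0.24, -0.04, -0.28,  0.33],
    [-0.15, -0.21, -0.25, -0.37, -0.37,  0.24,  1.00, -0.03,  0.04, -0.01],
    [ 0.50,  0.61,  0.61,  0.33,  0.07, -0.04, -0.03,  1.00,  0.58, -0.20],
    [ 0.24,  0.14,  0.14,  0.08, -0.23, -0.28,  0.04,  0.58,  1.00, -0.28],
    [-0.26, -0.16, -0.12, -0.07, -0.03,  0.33, -0.01, -0.20, -0.28,  1.00],
])

# The printed S_F/alpha pair differs in sign (0.03 vs -0.03); averaging the
# matrix with its transpose makes it symmetric and sets that entry to 0.
R = (R + R.T) / 2

eigenvalues = np.sort(np.linalg.eigvalsh(R))[::-1]      # descending order
cumulative = np.cumsum(eigenvalues) / eigenvalues.sum()

n_for_70 = int(np.searchsorted(cumulative, 0.70) + 1)   # PCs needed for 70%
n_above_07 = int(np.sum(eigenvalues > 0.7))             # eigenvalue > 0.7 rule

print(np.round(eigenvalues, 3))
print(n_for_70, n_above_07)
```

Because the diagonal of a correlation matrix is unity, the eigenvalues sum to the number of variables, so each eigenvalue divided by 10 is the fraction of the total variance carried by that PC.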



Table 2. ML solution assuming 6 factors. Each column lists the loadings of the given factor (a larger absolute value
represents a higher weight of the given variable); the sum of their squares is denoted by SS loadings; Proportion
Var is the ratio of SS loadings to the sum of the variances of the input variables; Cumulative Var is the running sum
of the proportional variances.

Variable        Factor1  Factor2  Factor3  Factor4  Factor5  Factor6
log T90           0.418    0.128   -0.066    0.884   -0.133    0.017
log T50           0.770    0.022   -0.087    0.490   -0.036    0.320
log τ             0.928    0.038   -0.158    0.198   -0.006    0.146
log V             0.249    0.063   -0.225    0.043   -0.041    0.844
log S_F           0.173   -0.241   -0.319    0.036   -0.042    0.252
log τ_lag         0.246   -0.269    0.235   -0.008    0.333   -0.187
log RE_pk        -0.070    0.001    0.981   -0.050    0.003   -0.159
log F             0.564    0.499    0.047    0.226   -0.066    0.187
log E_pk          0.108    0.974    0.054    0.074   -0.159   -0.008
α                -0.098   -0.105   -0.024   -0.106    0.981   -0.004
SS loadings       2.126    1.363    1.212    1.134    1.126    0.995
Proportion Var    0.213    0.136    0.121    0.113    0.113    0.099
Cumulative Var    0.213    0.349    0.470    0.584    0.696    0.796



Table 3. ML solution assuming 5 factors after removing the log RE_pk variable. Testing the hypothesis that 5 factors
are sufficient resulted in p = 0.273.

Variable        Factor1  Factor2  Factor3  Factor4  Factor5
log T90           0.875    0.009    0.088   -0.152   -0.051
log T50           0.895    0.353    0.039    0.026    0.236
log τ             0.704    0.277    0.090    0.095    0.592
log V             0.176    0.973    0.091   -0.098    0.016
log S_F           0.133    0.320   -0.244   -0.020    0.141
log τ_lag         0.110   -0.144   -0.175    0.490    0.141
log F             0.528    0.183    0.520   -0.068    0.245
log E_pk          0.146   -0.060    0.947   -0.272   -0.005
α                -0.191    0.038   -0.053    0.730   -0.100
SS loadings       2.459    1.309    1.285    0.895    0.519
Proportion Var    0.273    0.145    0.143    0.099    0.058
Cumulative Var    0.273    0.419    0.561    0.661    0.719
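
The summary rows of Table 3 follow directly from the loadings, as the caption of Table 2 defines them. The short check below recomputes them and also lists the variable with the largest absolute loading in each factor; the variable labels are plain-text stand-ins chosen for this sketch. Since the variables are standardized, the sum of the input variances equals the number of variables (9).

```python
import numpy as np

variables = ["log T90", "log T50", "log tau", "log V", "log S_F",
             "log tau_lag", "log F", "log E_pk", "alpha"]

# Loadings of Table 3 (rows: variables, columns: Factor1..Factor5).
loadings = np.array([
    [ 0.875,  0.009,  0.088, -0.152, -0.051],
    [ 0.895,  0.353,  0.039,  0.026,  0.236],
    [ 0.704,  0.277,  0.090,  0.095,  0.592],
    [ 0.176,  0.973,  0.091, -0.098,  0.016],
    [ 0.133,  0.320, -0.244, -0.020,  0.141],
    [ 0.110, -0.144, -0.175,  0.490,  0.141],
    [ 0.528,  0.183,  0.520, -0.068,  0.245],
    [ 0.146, -0.060,  0.947, -0.272, -0.005],
    [-0.191,  0.038, -0.053,  0.730, -0.100],
])

ss_loadings = (loadings ** 2).sum(axis=0)          # "SS loadings" row
proportion_var = ss_loadings / len(variables)      # "Proportion Var" row
cumulative_var = np.cumsum(proportion_var)         # "Cumulative Var" row

# Variable with the largest absolute loading in each factor.
dominant = [variables[i] for i in np.abs(loadings).argmax(axis=0)]

print(np.round(ss_loadings, 3))
print(np.round(cumulative_var, 3))
print(dominant)
```

The recomputed values agree with the printed rows to within the rounding of the loadings, and the last cumulative value corresponds to the 71.9% quoted in Sect. 3.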



The second factor is dominated by $V$. However, according
to Ramirez-Ruiz & Fenimore (2000), Reichart et al.
(2001), and Guidorzi et al. (2005), the variability should
be correlated with the luminosities of GRBs, and hence with
the fluence. No significant connection is, however, indicated
by the second factor, which raises doubts about the redshift
estimates derived from the variability.

The third factor is mainly driven by $E_{\rm pk}$. It is interesting
that the peak energy of the spectrum dominates
the third factor so strongly. This emphasizes that the spectrum
itself is an important quantity (an expected result)
and that, within the spectrum, $E_{\rm pk}$ itself is a significant
descriptor (an unexpected result). In addition, the loading of
$\mathcal{F}$ is also important in the third factor. All this has a
remarkable impact on the Amati-relation.

The Amati-relation (Amati et al. 2002) proposes that
there should be a linear connection between $\log E_{\rm pk;intr}$ and
$\log E_{\rm iso}$, where $E_{\rm iso}$ is the emitted energy under the
assumption of isotropic emission, $E_{\rm pk;intr} = (1+z)E_{\rm pk}$ is
the intrinsic peak energy, and $z$ is the redshift. This
follows from the relation $E_{\rm pk;intr} \propto E_{\rm iso}^{x}$ found
by Amati et al. (2002) from the analysis of twelve bright
long GRBs with well-measured redshifts. The most probable
value of $x$ was around $x = 0.5$. Thus, the Amati-relation,
in its original form, claims that a direct linear connection
exists only between $\log E_{\rm pk;intr}$ and $\log E_{\rm iso}$. We note that
the Amati-relation was predicted even earlier by the strong
correlation between $\log\mathcal{F}$ and $\log E_{\rm pk}$ (Lloyd et al. 2000).
The importance of the Amati-relation is straightforward: if
it holds, then it is possible to determine the redshift of a
given long burst from the value of $E_{\rm pk}$ alone, because $E_{\rm pk}$
defines $E_{\rm iso}$ independently of $\mathcal{F}$. Then, by applying
standard cosmology, we can calculate the redshift from the known
$E_{\rm iso}$ and $\mathcal{F}$ values (e.g. Mészáros & Mészáros 1995).

The validity of the Amati-relation has been a matter
of intense discussion since its publication. Several papers
confirmed it with newer analyses (e.g. Amati 2006;
Ghirlanda et al. 2007, 2008, and references therein).
Cabrera et al. (2007) confirmed the existence of the
$E_{\rm pk;intr}$-$E_{\rm iso}$ correlation in the rest frame for 47 Swift GRBs.
These studies considered bright long GRBs with known redshifts,
which enabled $E_{\rm iso}$ to be determined. This causes a strong
selection effect in the studied samples. It is possible that,
because of this selection effect, the entire BATSE sample, for
example, follows the Amati-relation only in a modified version
or even not at all, even though the relation holds for the
truncated sample of bright GRBs (Nakar & Piran 2005;
Butler et al. 2007). BB06 found that it is better to use
$E_{\rm pk;intr} \propto E_{\rm iso}^{a_1}\tau_{\rm intr}^{b_1}$ with suitable $a_1$ and $b_1$ for the BATSE
sample ($\tau_{\rm intr} = \tau/(1+z)$). Hence, if $b_1 \neq 0$, then the
Amati-relation is altered. BB06 propose, as the optimal choice,
$b_1 = -0.3$. Some papers even reject the Amati-relation both
in the BATSE sample (Nakar & Piran 2005) and in the
Swift sample (Butler et al. 2007). The most radical solution
even challenges the meaning of $E_{\rm pk;intr}$ itself in the
spectra of GRBs (Ryde 2005b).

For our purposes, it is statistically essential that the
correlation between $\log\mathcal{F}$ and $\log E_{\rm pk}$ does not imply that
a linear connection exists only between $\log E_{\rm iso}$ and
$\log E_{\rm pk;intr}$. BB06 also arrived at the conclusion that a
relation of the form

$$ \log E_{\rm iso} = a_1 \log E_{\rm pk;intr} + b_1 \log \tau_{\rm intr} + c_1 \qquad (3) $$

should exist with some suitable non-zero constants $a_1$, $b_1$,
and $c_1$. We note that $T_{50}$ and $\tau$ correlate strongly with
each other, i.e. in this equation either $\tau_{\rm intr}$ or $T_{50;\rm intr}$ can
be used.
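
Written in terms of observed quantities, Eq. (3) makes the redshift dependence explicit. The short derivation below only substitutes the definitions $E_{\rm pk;intr} = (1+z)E_{\rm pk}$ and $\tau_{\rm intr} = \tau/(1+z)$ given in the text; no values of the constants are assumed.

```latex
\begin{align}
\log E_{\rm iso}
 &= a_1 \log\left[(1+z)\,E_{\rm pk}\right]
  + b_1 \log\left[\frac{\tau}{1+z}\right] + c_1 \\
 &= a_1 \log E_{\rm pk} + b_1 \log \tau
  + (a_1 - b_1)\,\log(1+z) + c_1 .
\end{align}
```

For $b_1 = 0$ the relation reduces to one between $\log E_{\rm iso}$ and $\log E_{\rm pk;intr}$ alone, which is the form assumed by the original Amati-relation; a non-zero $b_1$ changes both the dependence on the observed duration and the coefficient of $\log(1+z)$.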

The factor loadings imply that $\log\mathcal{F}$ is explained basically
by the first and third factors. Since $\log\tau$ and $\log E_{\rm pk}$
are very strong in Factor1 and Factor3, respectively, this
suggests that

$$ \log E_{\rm iso} = a_2 \log E_{\rm pk;intr} + b_2 \log \tau_{\rm intr} + c_2 \log L_{\rm iso} + d \qquad (4) $$

should hold with some suitable non-zero constants $a_2$, $b_2$, $c_2$, and $d$
($L_{\rm iso}$ is the isotropic peak luminosity). We note
that a similar relation was also proposed by Firmani et al.
(2006).

The correlation between $\log\mathcal{F}$ and $\log E_{\rm pk}$ is mainly
determined by Factor3. It follows from the loadings of the
first and third factors that the relationship between $\log\mathcal{F}$
and $\log E_{\rm pk}$ is as important as that with the variables
dominating Factor1. This fact disfavors a simple linear relationship
only between $\log E_{\rm pk;intr}$ and $\log E_{\rm iso}$. The detailed study
of Eq. (4) (i.e. the determination of $a_2$, $b_2$, $c_2$, and $d$, and
alternative equations) is beyond the scope of this paper. Even
from this conclusion, however, it follows that the Amati-relation
in its original form is disfavored, and that a modified version
of the kind proposed by BB06 is supported here.

The fourth factor is defined by the low frequency spectral
index $\alpha$ and $\tau_{\rm lag}$. This implies that the direct correlation
between $\tau_{\rm lag}$ and $V$ is negligible, and hence there is no direct
support for the luminosity estimators based on these two
variables (Ramirez-Ruiz & Fenimore 2000; Reichart et al.
2001; Norris 2002).

The fifth factor is dominated by $\tau$, with smaller loadings
from $\mathcal{F}$ and $T_{50}$ (Table 3). Together with the
first factor, this demonstrates that $T_{90}$ and $T_{50}$ are not
completely equivalent, although $T_{50}$ characterizes a burst more
closely.

In our opinion, the most remarkable result is that so few 
quantities are needed, i.e. that all nine quantities can be 
characterized by five variables. Because all of these conclu- 
sions are derived from the measured data alone, all models 
of GRBs must respect these expectations. 

The number of essential variables is in accordance with 
BB06. They claimed that 3-5 PCs should be used, and we 
constrained the number of important quantities to be 5. 



5. Conclusions 

The results of the paper may be summarized as follows. 

- No more than 5 factors should be introduced. This
  substantial reduction in the number of significant variables
  is the key result of this paper.

- The structure of the factors is similar to that of the PCs of
  BB06. The number of important quantities is more accurately
  defined here.

- The first factor depends mainly on the temporal
  variables, and the quantities $T_{50}$ and $T_{90}$ are the preferred
  length indicators.

- The second factor is dominated by the variability.

- The connection of $E_{\rm pk}$ with other quantities in the third
  factor, and the structure of the first three factors,
  cast some doubt on the Amati-relation in its original
  form.

- The $\alpha$ and $\tau_{\rm lag}$ loadings in the fourth factor give
  no direct support for the luminosity estimators.

- The fifth factor demonstrates that $T_{90}$ and $T_{50}$ are not
  completely equivalent.